This page contains the code used to analyze personal networks (ego networks) on Twitter.
library(rtweet)
source("createTokens.R") ## private keys and tokens
source("rtweet_functions.R") ## functions for working with multiple tokens
library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)
library(ggwordcloud)
library(tidytext)
theme_set(theme_custom())
The first step is to choose a focal user (or "ego") from which we build a personal network.
ego <- "acastroaraujo" # Me
ego_info <- lookup_users(ego, token = sample(token, 1))
ego_info$followers_count
## [1] 786
Name: andrés castro araújo
User: acastroaraujo
Followers: 786
Friends: 833
Joined Twitter on 2010-05-10 04:54:27
This analysis is divided into three parts: the network of followers, the network of friends, and the network of mutual connections.
Each of these three dimensions corresponds to a different flow of interaction. The first consists of the users who receive information from acastroaraujo, the second consists of the users who generate the information that acastroaraujo receives, and the third consists of the users with whom the flow of information is reciprocal.
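As a minimal sketch of these three dimensions, the toy example below uses made-up user IDs (not real accounts) and base R set operations. The names `followers`, `friends`, and `mutuals` are illustrative and are not objects created elsewhere on this page.

```r
# Hypothetical user IDs, for illustration only
followers <- c("a", "b", "c", "d")  # accounts that follow ego
friends   <- c("c", "d", "e")       # accounts that ego follows

# Reciprocal ties: accounts that appear in both sets
mutuals <- intersect(followers, friends)
mutuals
```

In this toy case the mutual set is the overlap between the two lists; the analysis further down also requires the mutual users to be connected to each other.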
This code is freely available, except for the private keys and tokens, which are obtained by opening a developer account at https://developer.twitter.com/
The following code extracts the list of acastroaraujo's followers (each identified by a user_id).
ego_followers <- get_followers(ego, token = sample(token, 1))
ego_followers
## # A tibble: 786 x 1
## user_id
## <chr>
## 1 53187962
## 2 90899451
## 3 168368279
## 4 1197545169341468672
## 5 363863237
## 6 892820477747572738
## 7 26968974
## 8 1015741492411891713
## 9 3081449317
## 10 615478352
## # … with 776 more rows
This user_id is unique to each account and stays the same even when the user changes their screen name.
The following code creates a folder called *_friends_of_followers/ that stores, for each of these users, the list of accounts they follow (their friends).
Depending on the number of users and the number of tokens, this can take several hours (or even days).
outfolder <- paste0(ego, "_friends_of_followers/")
if (!dir.exists(outfolder)) dir.create(outfolder)
## escape the dot and anchor the pattern so only the file extension is removed
users_done <- str_replace(dir(outfolder), "\\.rds$", "")
users_left <- setdiff(ego_followers$user_id, users_done)
while (length(users_left) > 0) {
  new_user <- users_left[[1]]
  ## try() keeps the loop running when a request fails (e.g., protected accounts)
  friends_of_user <- try(multi_get_friends(new_user, token))
  file_name <- str_glue("{outfolder}{new_user}.rds")
  write_rds(friends_of_user, file_name, compress = "gz")
  users_left <- users_left[-which(users_left %in% new_user)] ## int. subset
}
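The resumable pattern used in this loop can be tried without the Twitter API. The sketch below is only an illustration: `fetch_friends()` is a hypothetical stand-in for `multi_get_friends()`, the user IDs are made up, and files are written to a temporary folder.

```r
# Hypothetical stand-in for multi_get_friends(); it fails for user "u2"
fetch_friends <- function(user) {
  if (user == "u2") stop("protected account")
  data.frame(from = user, to = c("x", "y"), stringsAsFactors = FALSE)
}

outfolder <- file.path(tempdir(), "toy_friends")
dir.create(outfolder, showWarnings = FALSE)

all_users <- c("u1", "u2", "u3")

# Users already saved on disk are skipped, so the loop can be restarted safely
users_done <- sub("\\.rds$", "", list.files(outfolder))
users_left <- setdiff(all_users, users_done)

for (user in users_left) {
  # Errors are stored as try-error objects instead of stopping the loop
  result <- try(fetch_friends(user), silent = TRUE)
  saveRDS(result, file.path(outfolder, paste0(user, ".rds")))
}

sort(list.files(outfolder))
```

Because one file is written per user, interrupting and rerunning the script only requests the users that are still missing.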
For some users this information cannot be retrieved because their accounts are protected.
In this case, no information is available for 6.4% of acastroaraujo's followers.
To build the network, we take the full list of users and their friends and arrange it in two columns, where each row indicates one user (from) following another user (to).
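edge_list <- list.files(outfolder, full.names = TRUE) %>%
  map(read_rds)
## flag failed downloads (stored as try-error objects) and drop them
is_error <- map_lgl(edge_list, ~ inherits(.x, "try-error"))
edge_list <- edge_list[!is_error] %>%
  bind_rows()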
edge_list
## # A tibble: 1,397,826 x 2
## from to
## <chr> <chr>
## 1 1001194679977893888 106228188
## 2 1001194679977893888 303862998
## 3 1001194679977893888 53279593
## 4 1001194679977893888 89109653
## 5 1001194679977893888 47514423
## 6 1001194679977893888 91831163
## 7 1001194679977893888 150638911
## 8 1001194679977893888 405729991
## 9 1001194679977893888 350926847
## 10 1001194679977893888 4853185695
## # … with 1,397,816 more rows
There are 1,397,826 connections here. However, this includes connections to users beyond those who follow acastroaraujo.
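Failed downloads have to be dropped before the remaining results are stacked into a single edge list. The sketch below shows one way to do this in base R on made-up data; `results` and `is_error` are illustrative names, and this is an assumption about how such a filter can be written rather than the exact code behind the figures on this page.

```r
# Toy download results: two data frames and one failed request
results <- list(
  data.frame(from = "u1", to = "x", stringsAsFactors = FALSE),
  try(stop("protected account"), silent = TRUE),
  data.frame(from = "u3", to = "y", stringsAsFactors = FALSE)
)

is_error <- vapply(results, inherits, logical(1), what = "try-error")

# Share of users whose friend list could not be downloaded
mean(is_error)

# Logical indexing is safe even when nothing failed;
# results[-which(is_error)] would return an empty list in that case
edge_list <- do.call(rbind, results[!is_error])
nrow(edge_list)
```

The share computed by `mean(is_error)` is the kind of figure reported above as the percentage of users without information.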
We can also retrieve metadata about each user.
ego_followers_info <- lookup_users(ego_followers$user_id, token = sample(token, 1))
write_rds(ego_followers_info, paste0(ego, "_follower_info.rds"), compress = "gz")
ego_followers_info <- read_rds(paste0(ego, "_follower_info.rds")) %>%
filter(!protected) %>%
select(
user_id, screen_name, lang, name, location, description,
ends_with("count"), -starts_with("quote"),
-starts_with("retweet"), -reply_count,
-starts_with("fav")
) %>%
rename(name = user_id, user_name = name)
id_dict <- ego_followers_info %>%
select(name, screen_name) %>%
deframe()
For example, this is the information for the followers of acastroaraujo who have the most followers themselves.
ego_followers_info %>%
arrange(desc(followers_count)) %>%
select(screen_name, description, location, followers_count, friends_count)
## # A tibble: 730 x 5
## screen_name description location followers_count friends_count
## <chr> <chr> <chr> <int> <int>
## 1 RodrigoUpri… "Investigador @Dejusti… "Colombia" 143076 558
## 2 Rivas_Santi… "Director/presentador … "" 132162 9233
## 3 FundarMexico "Organización plural, … "Ciudad d… 115607 15113
## 4 CVderoux "Ex concejal de Bogotá… "Bogotá D… 114916 15145
## 5 JuanitaGoe "Representante a la Cá… "Bogotá, … 111332 7405
## 6 Bejumero "Alcalde de Bejuma 201… "Carabobo… 95466 72899
## 7 Popeye_leye… "" "Colombia" 80287 4220
## 8 Dejusticia "Centro de Estudios de… "Bogotá, … 67441 2145
## 9 JoseOMorera "#Profesor #Autor #Con… "Bogotá, … 60723 58979
## 10 Pacifistacol "Una plataforma para l… "Colombia" 50959 2208
## # … with 720 more rows
Finally, we are interested in the personal network of acastroaraujo's followers, so we remove the connections to users who are not among his 786 followers.
edge_list <- edge_list %>%
filter(to %in% ego_followers_info$name) %>%
filter(from %in% ego_followers_info$name)
edge_list
## # A tibble: 20,977 x 2
## from to
## <chr> <chr>
## 1 1001194679977893888 142469128
## 2 1001194679977893888 36087400
## 3 1001194679977893888 14063051
## 4 1001194679977893888 48253393
## 5 1001194679977893888 382592033
## 6 1001194679977893888 410228042
## 7 1001194679977893888 12542002
## 8 1004030798125821953 76678975
## 9 1004030798125821953 162139278
## 10 1004030798125821953 1894875410
## # … with 20,967 more rows
The personal follower network of acastroaraujo that we were able to reconstruct has 703 users with 20,977 connections (27 of the 730 public accounts do not appear in any connection).
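The filtering step that restricts the edge list to the ego network can be illustrated on a toy edge list. This is a base R sketch with made-up IDs; `ego_set` stands in for the vector of follower IDs.

```r
# Hypothetical edge list: each row means `from` follows `to`
edges <- data.frame(
  from = c("a", "a", "b", "c", "c"),
  to   = c("b", "z", "c", "a", "y"),
  stringsAsFactors = FALSE
)
ego_set <- c("a", "b", "c")  # stand-in for the follower IDs

# Keep an edge only if both endpoints belong to the ego set
inside <- edges$from %in% ego_set & edges$to %in% ego_set
ego_edges <- edges[inside, ]
ego_edges
```

Users in the ego set who have no edge left after this filter do not show up as nodes when the graph is built from the edge list alone.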
ego_network <- edge_list %>%
tidygraph::as_tbl_graph() %>%
left_join(ego_followers_info) %>%
rename(name = screen_name, user_id = name) %>%
select(name, everything())
ego_network
## # A tbl_graph: 703 nodes and 20977 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 703 x 10 (active)
## name user_id lang user_name location description followers_count
## <chr> <chr> <chr> <chr> <chr> <chr> <int>
## 1 Buho… 100119… es Búho Sol… "" "" 42
## 2 Nava… 100403… es Esteban … "Bogotá… "Gender & … 583
## 3 Mara… 100415… <NA> Mararía … "" "" 14
## 4 Davi… 100418… <NA> David "" "" 1
## 5 jarj… 100784… en Alexande… "Bogota… "Economist… 84
## 6 JFer… 100992… en J. Ferna… "Extrem… "Hyperacti… 396
## # … with 697 more rows, and 3 more variables: friends_count <int>,
## # listed_count <int>, statuses_count <int>
## #
## # Edge Data: 20,977 x 2
## from to
## <int> <int>
## 1 1 158
## 2 1 413
## 3 1 152
## # … with 20,974 more rows
## Descriptive statistics
ego_network <- ego_network %>%
mutate(
out_degree = centrality_degree(mode = "out"),
in_degree = centrality_degree(mode = "in"),
betweenness = centrality_betweenness(directed = TRUE),
authority_score = centrality_authority(),
eigen_centrality = centrality_eigen(directed = TRUE)
)
ego_network
## # A tbl_graph: 703 nodes and 20977 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 703 x 15 (active)
## name user_id lang user_name location description followers_count
## <chr> <chr> <chr> <chr> <chr> <chr> <int>
## 1 Buho… 100119… es Búho Sol… "" "" 42
## 2 Nava… 100403… es Esteban … "Bogotá… "Gender & … 583
## 3 Mara… 100415… <NA> Mararía … "" "" 14
## 4 Davi… 100418… <NA> David "" "" 1
## 5 jarj… 100784… en Alexande… "Bogota… "Economist… 84
## 6 JFer… 100992… en J. Ferna… "Extrem… "Hyperacti… 396
## # … with 697 more rows, and 8 more variables: friends_count <int>,
## # listed_count <int>, statuses_count <int>, out_degree <dbl>,
## # in_degree <dbl>, betweenness <dbl>, authority_score <dbl>,
## # eigen_centrality <dbl>
## #
## # Edge Data: 20,977 x 2
## from to
## <int> <int>
## 1 1 158
## 2 1 413
## 3 1 152
## # … with 20,974 more rows
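To make the two degree measures concrete, the sketch below computes them by hand from a toy edge list in base R. The data are made up, and this only mirrors what `centrality_degree()` reports for `mode = "out"` and `mode = "in"`; betweenness and the other scores are not reproduced here.

```r
# Hypothetical edge list: each row means `from` follows `to`
edges <- data.frame(
  from = c("a", "a", "b", "c"),
  to   = c("b", "c", "c", "a"),
  stringsAsFactors = FALSE
)
nodes <- sort(unique(c(edges$from, edges$to)))

# Out-degree: how many accounts inside the network each node follows
out_degree <- table(factor(edges$from, levels = nodes))
# In-degree: how many accounts inside the network follow each node
in_degree <- table(factor(edges$to, levels = nodes))

data.frame(
  name = nodes,
  out_degree = as.integer(out_degree),
  in_degree = as.integer(in_degree)
)
```

In a follower network, in-degree counts attention received from other members of the network, which is why it is used below as the measure of influence inside the personal network.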
The following plot shows each user's influence on Twitter (horizontal axis) versus each user's influence within the personal network of followers (vertical axis).
ego_network %>%
as_tibble() %>%
#filter(in_degree > 5) %>%
ggplot(aes(followers_count, in_degree)) +
geom_point()
ego_network %>%
as_tibble() %>%
mutate(label_name = ifelse(
test = rank(-followers_count) <= 10 | rank(-in_degree) <= 10,
yes = name,
no = NA_character_)
) %>%
ggplot(aes(followers_count, in_degree)) +
geom_point() +
ggrepel::geom_label_repel(aes(label = label_name), size = 3)
Clusters
set.seed(123)
clusters <- igraph::cluster_walktrap(graph = ego_network, steps = 7)
cluster_df <- tibble(cluster = factor(clusters$membership), name = clusters$names)
cluster_df <- cluster_df %>%
group_by(cluster) %>%
filter(n() >= 10) %>%
ungroup()
ego_network <- ego_network %>%
left_join(cluster_df)
ego_network %>%
as_tibble() %>%
arrange(desc(in_degree)) %>%
filter(!is.na(cluster)) %>%
group_by(cluster) %>%
filter(rank(-authority_score) <= 30) %>%
ggplot(aes(label = name, size = log(in_degree), color = in_degree)) +
geom_text_wordcloud_area(family = "Avenir Next Condensed") +
facet_wrap(~cluster) +
  labs(title = "Prominent followers in each cluster") +
scale_color_gradient(low = "grey", high = "purple")
Size of each cluster:
ego_network %>% as_tibble() %>% count(cluster)
## # A tibble: 6 x 2
## cluster n
## <fct> <int>
## 1 2 13
## 2 4 470
## 3 6 15
## 4 8 10
## 5 12 150
## 6 <NA> 45
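The `<NA>` row collects the users whose cluster has fewer than 10 members. A minimal base R sketch of that size filter, on a made-up membership vector, could look like this (`min_size` is an illustrative name, and the threshold is lowered to 3 so the toy data stay small):

```r
# Hypothetical membership vector, like the one in clusters$membership
membership <- c(1, 1, 1, 2, 2, 3)
min_size <- 3  # the page uses 10

sizes <- table(membership)
keep <- names(sizes)[sizes >= min_size]

# Nodes in small clusters get NA, as they do after the left_join() above
cluster <- factor(ifelse(membership %in% keep, membership, NA))
table(cluster, useNA = "ifany")
```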
Which users act as "bridges"?
ego_network %>%
as_tibble() %>%
arrange(desc(betweenness)) %>%
select(name, description, location)
## # A tibble: 703 x 3
## name description location
## <chr> <chr> <chr>
## 1 malbarracin Santandereano en el exilio. Abogado, activista … "Bogotá"
## 2 RAKarl Colombia past + present; author of #ForgottenPe… "Tierra Fría"
## 3 JuanitaGoe Representante a la Cámara por Bogotá (2018-2022… "Bogotá, D.C., …
## 4 psanabria Public Policy & Management Scholar | Profesor e… "Latin America"
## 5 Rivas_Sant… Director/presentador de Puntos Capitales. Parte… ""
## 6 MariaPradaU Abogada, Magíster en Antropología | Climate Pro… "Bogotá, Colomb…
## 7 Dejusticia Centro de Estudios de Derecho, Justicia y Socie… "Bogotá, Colomb…
## 8 MajoAlRiv Profesora | Socióloga | Desigualdad | Ciudades … ""
## 9 EmilioLeho… Ph.D candidate @NUSociology. STS, social moveme… ""
## 10 FulanoZulu… En defensa del pequeño ahorrador e inversionist… ""
## # … with 693 more rows
cols <- c("betweenness", "in_degree", "out_degree", "followers_count", "friends_count")
ego_network %>%
as_tibble() %>%
group_by(cluster) %>%
summarize(across(all_of(cols), mean)) %>%
arrange(desc(betweenness))
## # A tibble: 6 x 6
## cluster betweenness in_degree out_degree followers_count friends_count
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 2040. 7.92 7.85 3606. 3358.
## 2 6 1235. 10.5 10.6 265. 490.
## 3 4 1102. 35.9 35.4 4065. 2214.
## 4 12 666. 24.0 25.7 460. 745.
## 5 8 616. 4.5 5.2 905. 1434.
## 6 <NA> 609. 3.98 3.8 3934. 2815.
Given the information above, we can focus on particular segments of the personal network.
For example, we can focus exclusively on the users who belong to the clusters labeled 12, 4, 2, and 6.
ego_network_subset <- ego_network %>%
filter(cluster %in% c(12, 4, 2, 6)) %>%
mutate(
out_degree = centrality_degree(mode = "out"),
in_degree = centrality_degree(mode = "in"),
betweenness = centrality_betweenness(directed = TRUE),
authority_score = centrality_authority(),
eigen_centrality = centrality_eigen(directed = TRUE)
)
ego_network_subset %>%
ggraph("mds") +
geom_edge_fan(alpha = 1/5, width = 1/5) +
geom_node_point(aes(fill = cluster, size = in_degree),
shape = 21, color = "white", show.legend = FALSE)
ego_network_subset %>%
as_tibble() %>%
mutate(label_id = ifelse(
test = rank(-betweenness) <= 10 |rank(-in_degree) <= 10,
yes = name,
no = NA_character_)
) %>%
ggplot(aes(betweenness, in_degree, color = cluster)) +
geom_point() +
ggrepel::geom_label_repel(aes(label = label_id), size = 3)
ego_network_subset %>%
group_by(cluster) %>%
mutate(label_name = ifelse(
test = rank(-authority_score) <= 5 | rank(-betweenness) <= 5,
yes = name,
no = NA_character_
)) %>%
ggraph("mds") +
geom_edge_fan(alpha = 1/5, width = 1/5) +
geom_node_point(aes(fill = cluster, size = betweenness),
shape = 21, color = "white", show.legend = FALSE) +
geom_node_label(aes(label = label_name),
repel = TRUE, alpha = 3/4, size = 3)
This section repeats the previous analysis for the personal network of acastroaraujo's friends.
outfolder <- paste0(ego, "_friends_of_friends/")
if (!dir.exists(outfolder)) dir.create(outfolder)
ego_friends <- get_friends(ego, token = sample(token, 1))
ego_friends
## # A tibble: 833 x 2
## user user_id
## <chr> <chr>
## 1 acastroaraujo 974064791894593536
## 2 acastroaraujo 53187962
## 3 acastroaraujo 345427908
## 4 acastroaraujo 90899451
## 5 acastroaraujo 168368279
## 6 acastroaraujo 38883015
## 7 acastroaraujo 50354243
## 8 acastroaraujo 377284121
## 9 acastroaraujo 1159264860233961473
## 10 acastroaraujo 1040310675497730050
## # … with 823 more rows
users_done <- str_replace(dir(outfolder), "\\.rds$", "") ## escaped dot: only the extension is removed
users_left <- setdiff(ego_friends$user_id, users_done)
while (length(users_left) > 0) {
new_user <- users_left[[1]]
friends_of_user <- try(multi_get_friends(new_user, token))
file_name <- str_glue("{outfolder}{new_user}.rds")
write_rds(friends_of_user, file_name, compress = "gz")
users_left <- users_left[-which(users_left %in% new_user)] ## int. subset
}
In this case, no information is available for 2.4% of acastroaraujo's friends.
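edge_list <- list.files(outfolder, full.names = TRUE) %>%
  map(read_rds)
## flag failed downloads (stored as try-error objects) and drop them
is_error <- map_lgl(edge_list, ~ inherits(.x, "try-error"))
edge_list <- edge_list[!is_error] %>% bind_rows()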
edge_list
## # A tibble: 1,243,538 x 2
## from to
## <chr> <chr>
## 1 1001511262545592320 1128081883923918848
## 2 1001511262545592320 717312300529618945
## 3 1001511262545592320 1189495990119817217
## 4 1001511262545592320 1226598133205012480
## 5 1001511262545592320 1025538004167864321
## 6 1001511262545592320 39619991
## 7 1001511262545592320 300049226
## 8 1001511262545592320 1145686112
## 9 1001511262545592320 1653264782
## 10 1001511262545592320 983470194982088704
## # … with 1,243,528 more rows
ego_friends_info <- lookup_users(ego_friends$user_id, token = sample(token, 1))
write_rds(ego_friends_info, paste0(ego, "_friends_info.rds"), compress = "gz")
ego_friends_info <- read_rds(paste0(ego, "_friends_info.rds")) %>%
filter(!protected) %>%
select(
user_id, screen_name, lang, name, location, description,
ends_with("count"), -starts_with("quote"),
-starts_with("retweet"), -reply_count,
-starts_with("fav")
) %>%
rename(name = user_id, user_name = name)
id_dict <- ego_friends_info %>%
select(name, screen_name) %>%
deframe()
This is the information for the friends of acastroaraujo who have the most followers.
ego_friends_info %>%
arrange(desc(followers_count)) %>%
select(screen_name, description, location, followers_count, friends_count)
## # A tibble: 845 x 5
## screen_name description location followers_count friends_count
## <chr> <chr> <chr> <int> <int>
## 1 AOC "US Representative,NY-… "Bronx + … 10430327 2902
## 2 NewYorker "Unparalleled reportin… "New York… 8969567 373
## 3 hcapriles "#YoSoyVenezolano" "Venezuel… 7180260 1915
## 4 jack "#bitcoin" "" 4996684 4537
## 5 paulkrugman "Nobel laureate. Op-Ed… "New York… 4646688 67
## 6 DAVID_LYNCH "Filmmaker. Born Misso… "Los Ange… 3337686 42
## 7 ClaudiaLopez "Primera Alcaldesa de … "Bogotá, … 2472410 2506
## 8 sarahcpr "Watch my comedy speci… "New York… 2430096 3130
## 9 fdbedout "Periodista, presentad… "Miami, F… 2277988 679
## 10 lasillavacia "La cuenta de Twitter … "Bogotá" 1278648 3248
## # … with 835 more rows
edge_list <- edge_list %>%
filter(to %in% ego_friends_info$name) %>%
filter(from %in% ego_friends_info$name)
edge_list
## # A tibble: 47,526 x 2
## from to
## <chr> <chr>
## 1 1001511262545592320 607752311
## 2 1001511262545592320 69133574
## 3 1001511262545592320 2167059661
## 4 1001511262545592320 14247789
## 5 1001511262545592320 742379544309567489
## 6 1001511262545592320 381642287
## 7 1001511262545592320 2158970839
## 8 1001511262545592320 13074042
## 9 1001511262545592320 16284661
## 10 1004030798125821953 323599188
## # … with 47,516 more rows
The personal network of acastroaraujo's friends that we were able to reconstruct has 842 users with 47,526 connections.
ego_network <- edge_list %>%
tidygraph::as_tbl_graph() %>%
left_join(ego_friends_info) %>%
rename(name = screen_name, user_id = name) %>%
select(name, everything())
## Descriptive statistics
ego_network <- ego_network %>%
mutate(
out_degree = centrality_degree(mode = "out"),
in_degree = centrality_degree(mode = "in"),
betweenness = centrality_betweenness(directed = TRUE),
authority_score = centrality_authority(),
eigen_centrality = centrality_eigen(directed = TRUE)
)
ego_network
## # A tbl_graph: 842 nodes and 47526 edges
## #
## # A directed simple graph with 1 component
## #
## # Node Data: 842 x 15 (active)
## name user_id lang user_name location description followers_count
## <chr> <chr> <chr> <chr> <chr> <chr> <int>
## 1 WeAr… 100151… en We are R… "The wh… "RoCur (Ro… 22487
## 2 Nava… 100403… es Esteban … "Bogotá… "Gender & … 583
## 3 jarj… 100784… en Alexande… "Bogota… "Economist… 84
## 4 JFer… 100992… es J. Ferna… "Extrem… "Hyperacti… 396
## 5 chri… 101015… en Christin… "" "Sociologi… 2473
## 6 Cual… 101493… en Cualquie… "" "Founder a… 258
## # … with 836 more rows, and 8 more variables: friends_count <int>,
## # listed_count <int>, statuses_count <int>, out_degree <dbl>,
## # in_degree <dbl>, betweenness <dbl>, authority_score <dbl>,
## # eigen_centrality <dbl>
## #
## # Edge Data: 47,526 x 2
## from to
## <int> <int>
## 1 1 634
## 2 1 660
## 3 1 306
## # … with 47,523 more rows
The following plot shows each user's influence on Twitter (horizontal axis) versus each user's influence within the personal network of friends (vertical axis).
ego_network %>%
as_tibble() %>%
#filter(in_degree > 5) %>%
ggplot(aes(followers_count, in_degree)) +
geom_point()
ego_network %>%
as_tibble() %>%
mutate(label_name = ifelse(
test = rank(-followers_count) <= 10 | rank(-in_degree) <= 10,
yes = name,
no = NA_character_)
) %>%
ggplot(aes(followers_count, in_degree)) +
geom_point() +
ggrepel::geom_label_repel(aes(label = label_name), size = 3)
Clusters
clusters <- igraph::cluster_walktrap(graph = ego_network, steps = 12)
cluster_df <- tibble(cluster = factor(clusters$membership), name = clusters$names)
cluster_df <- cluster_df %>%
group_by(cluster) %>%
filter(n() >= 10) %>%
ungroup()
ego_network <- ego_network %>%
left_join(cluster_df)
ego_network %>%
as_tibble() %>%
arrange(desc(in_degree)) %>%
filter(!is.na(cluster)) %>%
group_by(cluster) %>%
filter(rank(-authority_score) <= 30) %>%
ggplot(aes(label = name, size = log(in_degree), color = in_degree)) +
geom_text_wordcloud_area(family = "Avenir Next Condensed") +
facet_wrap(~cluster) +
  labs(title = "Prominent friends in each cluster") +
scale_color_gradient(low = "grey", high = "purple")
Size of each cluster:
ego_network %>% as_tibble() %>% count(cluster)
## # A tibble: 2 x 2
## cluster n
## <fct> <int>
## 1 1 368
## 2 2 474
Which users act as "bridges"?
ego_network %>%
as_tibble() %>%
arrange(desc(betweenness)) %>%
select(name, description, location)
## # A tibble: 842 x 3
## name description location
## <chr> <chr> <chr>
## 1 RAKarl "Colombia past + present; author of #ForgottenPe… "Tierra Fría"
## 2 malbarracin "Santandereano en el exilio. Abogado, activista … "Bogotá"
## 3 infrahumano "" "Toronto"
## 4 SergioChap… "Conspiring for a #RightsBasedEconomy at @social… "Brooklyn, NY"
## 5 cblatts "@UChicago political economist studying violence… "Chicago, IL"
## 6 causalinf "Economist slacker. Can’t remember if he ever pu… "Waco, Texas"
## 7 MajoAlRiv "Profesora | Socióloga | Desigualdad | Ciudades … ""
## 8 AOC "US Representative,NY-14 (BX & Queens). In a mod… "Bronx + Queen…
## 9 Undercover… "Historian of postwar economics @CNRS/CREST @Xde… "where Delorea…
## 10 alondra "President @SSRC_org. Harold F. Linder Professor… "Gotham and Pr…
## # … with 832 more rows
cols <- c("betweenness", "in_degree", "out_degree", "followers_count", "friends_count")
ego_network %>%
as_tibble() %>%
group_by(cluster) %>%
summarize(across(all_of(cols), mean)) %>%
arrange(desc(betweenness))
## # A tibble: 2 x 6
## cluster betweenness in_degree out_degree followers_count friends_count
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 1304. 72.4 76.5 94460. 1593.
## 2 1 1295. 35.9 30.6 95043. 1309.
ego_network_subset <- ego_network %>%
filter(!is.na(cluster)) %>%
mutate(
out_degree = centrality_degree(mode = "out"),
in_degree = centrality_degree(mode = "in"),
betweenness = centrality_betweenness(directed = TRUE),
authority_score = centrality_authority(),
eigen_centrality = centrality_eigen(directed = TRUE)
)
ego_network_subset %>%
ggraph("mds") +
geom_edge_fan(alpha = 1/5, width = 1/5) +
geom_node_point(aes(fill = cluster, size = in_degree),
shape = 21, color = "white", show.legend = FALSE)
ego_network_subset %>%
as_tibble() %>%
mutate(label_id = ifelse(
test = rank(-betweenness) <= 10 |rank(-in_degree) <= 10,
yes = name,
no = NA_character_)
) %>%
ggplot(aes(betweenness, in_degree, color = cluster)) +
geom_point() +
ggrepel::geom_label_repel(aes(label = label_id), size = 3)
ego_network_subset %>%
group_by(cluster) %>%
mutate(label_name = ifelse(
test = rank(-authority_score) <= 5 | rank(-betweenness) <= 5,
yes = name,
no = NA_character_
)) %>%
ggraph("mds") +
geom_edge_fan(alpha = 1/5, width = 1/5) +
geom_node_point(aes(fill = cluster, size = betweenness),
shape = 21, color = "white", show.legend = FALSE) +
geom_node_label(aes(label = label_name),
repel = TRUE, alpha = 3/4, size = 3)
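This last part builds the network of mutual connections: users who both follow and are followed by acastroaraujo.
edge_list <- list.files(paste0(ego, "_friends_of_friends/"), full.names = TRUE) %>%
  map(read_rds)
## logical indexing also works when no download failed
is_error <- edge_list %>%
  map_lgl(~ inherits(.x, "try-error"))
edge_list <- edge_list[!is_error] %>% bind_rows()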
edge_list_mutual <- inner_join(
edge_list,
edge_list %>% rename(from = to, to = from)
) %>%
filter(from %in% ego_followers$user_id, to %in% ego_followers$user_id) %>%
filter(from %in% ego_friends$user_id, to %in% ego_friends$user_id) %>%
filter(from %in% to, to %in% from)
mat <- edge_list_mutual %>%
mutate(n = 1) %>%
tidytext::cast_sparse(from, to, n) %>%
as.matrix()
mat <- mat[colnames(mat), ]
mutual_network <- mat %>%
graph_from_adjacency_matrix(mode = "undirected") %>%
tidygraph::as_tbl_graph()
mutual_network
## # A tbl_graph: 297 nodes and 3319 edges
## #
## # An undirected simple graph with 2 components
## #
## # Node Data: 297 x 1 (active)
## name
## <chr>
## 1 148507300
## 2 362101597
## 3 880165015780827136
## 4 813222199
## 5 48253393
## 6 77232869
## # … with 291 more rows
## #
## # Edge Data: 3,319 x 2
## from to
## <int> <int>
## 1 1 2
## 2 1 4
## 3 1 5
## # … with 3,316 more rows
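The idea behind the mutual edge list is to keep a connection only when the reverse connection also exists. The base R sketch below shows this on a made-up edge list; it uses `merge()` where the page uses `inner_join()`, and all names are illustrative.

```r
# Hypothetical edge list: each row means `from` follows `to`
edges <- data.frame(
  from = c("a", "b", "a", "c", "d"),
  to   = c("b", "a", "c", "d", "c"),
  stringsAsFactors = FALSE
)

# Reverse every edge, then keep the edges that also exist in reverse
reversed <- data.frame(from = edges$to, to = edges$from, stringsAsFactors = FALSE)
mutual <- merge(edges, reversed, by = c("from", "to"))

# Each reciprocal pair appears twice (a-b and b-a); keep one row per pair
mutual_pairs <- mutual[mutual$from < mutual$to, ]
mutual_pairs
```

The edge from "a" to "c" is dropped because "c" does not follow "a" back. Treating each reciprocal pair as a single tie is what makes an undirected graph a natural representation for this network.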
ego_mutuals_info <- lookup_users(as_tibble(mutual_network)$name, token = sample(token, 1))
ego_mutuals_info <- ego_mutuals_info %>%
filter(!protected) %>%
select(
user_id, screen_name, lang, name, location, description,
ends_with("count"), -starts_with("quote"),
-starts_with("retweet"), -reply_count,
-starts_with("fav")
) %>%
rename(name = user_id, user_name = name)
mutual_network <- mutual_network %>%
inner_join(ego_mutuals_info) %>%
rename(name = screen_name, user_id = name) %>%
select(name, everything())
## Descriptive statistics
mutual_network <- mutual_network %>%
mutate(
degree = centrality_degree(),
betweenness = centrality_betweenness(directed = TRUE),
authority_score = centrality_authority(),
eigen_centrality = centrality_eigen(directed = TRUE)
)
The following plot shows each user's influence on Twitter (horizontal axis) versus each user's influence within the network of mutual connections (vertical axis).
mutual_network %>%
as_tibble() %>%
ggplot(aes(followers_count, degree)) +
geom_point()
mutual_network %>%
as_tibble() %>%
mutate(label_name = ifelse(
test = rank(-followers_count) <= 15 | rank(-degree) <= 15,
yes = name,
no = NA_character_)
) %>%
ggplot(aes(followers_count, degree)) +
geom_point() +
ggrepel::geom_label_repel(aes(label = label_name), size = 3)
Clusters
clusters <- igraph::cluster_louvain(graph = mutual_network)
cluster_df <- tibble(cluster = factor(clusters$membership), name = clusters$names)
cluster_df <- cluster_df %>%
group_by(cluster) %>%
filter(n() >= 10) %>%
ungroup()
mutual_network <- mutual_network %>%
left_join(cluster_df)
mutual_network %>%
as_tibble() %>%
arrange(desc(degree)) %>%
filter(!is.na(cluster)) %>%
group_by(cluster) %>%
filter(rank(-authority_score) <= 30) %>%
ggplot(aes(label = name, size = log(degree), color = degree)) +
geom_text_wordcloud_area(family = "Avenir Next Condensed") +
facet_wrap(~cluster) +
  labs(title = "Prominent users in each cluster") +
scale_color_gradient(low = "grey", high = "purple")
Size of each cluster:
mutual_network %>% as_tibble() %>% count(cluster)
## # A tibble: 5 x 2
## cluster n
## <fct> <int>
## 1 1 65
## 2 3 114
## 3 4 47
## 4 5 64
## 5 <NA> 5
Which users act as "bridges"?
mutual_network %>%
as_tibble() %>%
arrange(desc(betweenness))
## # A tibble: 295 x 15
## name user_id lang user_name location description followers_count
## <chr> <chr> <chr> <chr> <chr> <chr> <int>
## 1 RAKa… 572136… en Robert A… "Tierra… Colombia p… 9473
## 2 malb… 482533… es Mauricio… "Bogotá" Santandere… 39675
## 3 emil… 227530… en Emily Ma… "Athens… UGA '18; D… 450
## 4 Emil… 464906… en Emilio L… "" Ph.D candi… 2161
## 5 Marg… 491309… es Margarit… "Colomb… Estudiante… 759
## 6 gene… 141650… und (wannabe… "Palo A… Lapsed com… 18624
## 7 Majo… 878152… und María Jo… "" Profesora … 3354
## 8 Maca… 249998… es María-Cl… "Stony … Ph.D. Coca… 3802
## 9 Serg… 476898… es Sergio C… "Brookl… Conspiring… 2910
## 10 Mari… 766789… es Maria An… "Bogotá… Abogada, M… 5274
## # … with 285 more rows, and 8 more variables: friends_count <int>,
## # listed_count <int>, statuses_count <int>, degree <dbl>, betweenness <dbl>,
## # authority_score <dbl>, eigen_centrality <dbl>, cluster <fct>
cols <- c("betweenness", "degree", "followers_count", "friends_count")
mutual_network %>%
as_tibble() %>%
group_by(cluster) %>%
summarize(across(all_of(cols), mean)) %>%
arrange(desc(betweenness))
## # A tibble: 5 x 5
## cluster betweenness degree followers_count friends_count
## <fct> <dbl> <dbl> <dbl> <dbl>
## 1 3 306. 26.7 8976. 2455.
## 2 4 305. 6.96 2568. 1545.
## 3 5 177. 30.0 4158. 1441.
## 4 1 128. 19.0 696. 719.
## 5 <NA> 117. 1.8 335. 733.
mutual_network_subset <- mutual_network %>%
filter(!is.na(cluster)) %>%
mutate(
degree = centrality_degree(),
betweenness = centrality_betweenness(directed = TRUE),
authority_score = centrality_authority(),
eigen_centrality = centrality_eigen(directed = TRUE)
)
mutual_network_subset %>%
ggraph("mds") +
geom_edge_fan(alpha = 1/5, width = 1/5) +
geom_node_point(aes(fill = cluster, size = degree),
shape = 21, color = "white", show.legend = FALSE)
mutual_network_subset %>%
as_tibble() %>%
mutate(label_id = ifelse(
test = rank(-betweenness) <= 10 |rank(-degree) <= 10,
yes = name,
no = NA_character_)
) %>%
ggplot(aes(betweenness, degree, color = cluster)) +
geom_point() +
ggrepel::geom_label_repel(aes(label = label_id), size = 3)
mutual_network_subset %>%
group_by(cluster) %>%
mutate(label_name = ifelse(
test = rank(-degree) <= 5 | rank(-betweenness) <= 5,
yes = name,
no = NA_character_
)) %>%
ggraph("mds") +
geom_edge_fan(alpha = 1/5, width = 1/5) +
geom_node_point(aes(fill = cluster, size = betweenness),
shape = 21, color = "white", show.legend = FALSE) +
geom_node_label(aes(label = label_name),
repel = TRUE, alpha = 3/4, size = 3)
readLines("rtweet_functions.R") %>%
writeLines()
##
## # main functions ----------------------------------------------------------
##
## multi_get_friends <- function(u, token_list) {
##
## user_info <- lookup_users(u, token = sample(token_list, 1)[[1]])
## fc <- user_info$friends_count
## message("<<", user_info$screen_name, ">> is following ", scales::comma(fc), " users ")
##
## if (user_info$protected) stop(call. = FALSE, "The account is protected, we can't get friends.")
##
## num_queries <- ceiling(fc / 5000)
## rl <- rate_limit(token_list, "get_friends")
## rl <- validate_rate_limit(rl, "get_friends", token_list)
##
## index <- get_available_token_index(rl)
##
## # Case 0: User doesn't have any friends
##
## if (fc == 0) return(tibble(from = character(0), to = character(0)))
##
## # Case 1: At most 5,000 friends, only one call is needed
##
## if (fc <= 5e3) {
##
## friends <- get_friends(u, token = token_list[[index]])
##
## } else {
##
## # Case 2: Many calls are needed
##
## output <- vector("list", length = num_queries)
## output[[1]] <- get_friends(u, token = token_list[[index]])
##
## for (i in 2:length(output)) {
##
## rl <- validate_rate_limit(rl, "get_friends", token_list)
## index <- get_available_token_index(rl)
## output[[i]] <- get_friends(u, token = token_list[[index]], page = next_cursor(output[[i - 1]]))
##
## }
##
## friends <- bind_rows(output) %>%
## distinct()
##
## }
##
## attr(friends, "next_cursor") <- NULL
##
## friends %>%
## rename(from = user, to = user_id) %>%
## mutate(from = user_info$user_id)
##
## }
##
## multi_get_timeline <- function(u, n, token_list, home = FALSE) {
##
## message(u)
## rl <- rate_limit(token_list, "get_timeline")
## rl <- validate_rate_limit(rl, "get_timeline", token_list)
##
## index <- get_available_token_index(rl)
##
## # Case 0: User doesn't have any posts
##
## # what to do?
##
## # Should we allow to get all the timeline??? If so, mimic previous function
##
## tl <- get_timeline(u, n = n, home = home, token = token_list[[index]])
##
## return(tl)
##
## }
##
## # multi_lookup_users <- function() {
## #
## #
## # }
##
##
## # helpers -----------------------------------------------------------------
##
## validate_rate_limit <- function(rl, q, token_list) {
##
## if (is_empty(rl)) {
## message("Waiting for rate limiting update")
## Sys.sleep(60)
## rl <- rate_limit(token_list, query = q)
## return(validate_rate_limit(rl, q, token_list)) # recursion: return the refreshed table
##
## }
##
## if (all(rl$remaining == 0)) {
##
## message("Waiting for token reset in ", round(min(rl$reset), 1), " minutes")
## Sys.sleep(min(as.numeric(rl$reset_at - Sys.time(), units = "secs")) + 5)
## rl <- rate_limit(token_list, query = q)
## return(validate_rate_limit(rl, q, token_list)) # recursion: return the refreshed table
##
## }
##
## rl
##
## }
##
## get_available_token_index <- function(rl) {
##
## env <- rlang::caller_env()
## available_token <- rl$remaining > 0
## index <- which(available_token)[[1]]
## env$rl[index, ]$remaining <- rl[index, ]$remaining - 1 # this modifies the rl obj in the parent frame
## return(index)
##
## }
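The token-selection helper can also be written without modifying an object in the caller's environment. The sketch below is a hypothetical alternative, not part of rtweet_functions.R: `pick_token()` and the toy rate-limit table are made up for illustration.

```r
# Hypothetical rate-limit table: one row per token
rl <- data.frame(
  token = c("t1", "t2", "t3"),
  remaining = c(0, 2, 15),
  stringsAsFactors = FALSE
)

# Pick the first token with calls left and record that one call was spent
pick_token <- function(rl) {
  index <- which(rl$remaining > 0)[1]
  if (is.na(index)) stop("no token has remaining calls")
  rl$remaining[index] <- rl$remaining[index] - 1
  list(index = index, rl = rl)
}

picked <- pick_token(rl)
picked$index
picked$rl$remaining
```

Returning the updated table makes the bookkeeping explicit, at the cost of having the caller reassign `rl` after every call.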
theme_custom
## function (base_family = "Avenir Next Condensed", fill = "white", ...) {
## theme_minimal(base_family = base_family, ...) %+replace%
## theme(plot.title = element_text(face = "bold", margin = margin(0,
## 0, 5, 0), hjust = 0, size = 13), plot.subtitle = element_text(face = "italic",
## margin = margin(0, 0, 5, 0), hjust = 0), plot.background = element_rect(fill = fill,
## size = 0), complete = TRUE, axis.title.x = element_text(margin = margin(15,
## 0, 0, 0)), axis.title.y = element_text(angle = 90,
## margin = margin(0, 20, 0, 0)), strip.text = element_text(face = "italic",
## colour = "white"), strip.background = element_rect(fill = "#4C4C4C"))
## }